Cross database training of audio-visual hidden Markov models for phone recognition

نویسندگان

  • Shahram Kalantari
  • David Dean
  • Houman Ghaemmaghami
  • Sridha Sridharan
  • Clinton Fookes
چکیده

Speech recognition can be improved by using visual information in the form of lip movements of the speaker in addition to audio information. To date, state-of-the-art techniques for audio-visual speech recognition continue to use audio and visual data of the same database for training their models. In this paper, we present a new approach to make use of one modality of an external dataset in addition to a given audio-visual dataset. By so doing, it is possible to create more powerful models from other extensive audio-only databases and adapt them on our comparatively smaller multi-stream databases. Results show that the presented approach outperforms the widely adopted synchronous hidden Markov models (HMM) trained jointly on audio and visual data of a given audio-visual database for phone recognition by 29% relative. It also outperforms the external audio models trained on extensive external audio datasets and also internal audio models by 5.5% and 46% relative respectively. We also show that the proposed approach is beneficial in noisy environments where the audio source is affected by the environmental noise.

منابع مشابه

Bimodal speech recognition using coupled hidden Markov models

In this paper we present a bimodal speech recognition system in which the audio and visual modalities are modeled and integrated using coupled hidden Markov models (CHMMs). CHMMs are probabilistic inference graphs that have hidden Markov models as sub-graphs. Chains in the corresponding inference graph are coupled through matrices of conditional probabilities modeling temporal influences betwee...

متن کامل

Labeling audio-visual speech corpora and training an ANN/HMM audio-visual speech recognition system

We present a method to label an audio-visual database and to setup a system for audio-visual speech recognition based on a hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) approach. The multi-stage labeling process is presented on a new audiovisual database recorded at the Institute de la Communication Parlée (ICP). The database was generated via transposition of the audio databas...

متن کامل

Cheap Bootstrap of Multi-Lingual Hidden Markov Models

In this work we investigate the usage of TV audio data for cross-language training of multi-lingual acoustic models. We intend to take advantage from the availability of a training speech corpus formed by parallel news uttered in different languages and transmitted over separated audio channels. Spanish, French and Russian phone Hidden Markov Models (HMMs) are bootstrapped using an unsupervised...

متن کامل

Product HMMs for audio-visual continuous speech recognition using facial animation parameters

The use of visual information in addition to acoustic can improve automatic speech recognition. In this paper we compare different approaches for audio-visual information integration and show how they affect automatic speech recognition performance. We utilize Facial Animation Parameters (FAPs), supported by the MPEG-4 standard for the visual representation as visual features. We use both Singl...

متن کامل

An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition

This paper presents a novel Hidden Markov Model architecture to model the joint probability of pairs of asynchronous sequences describing the same event. It is based on two other Markovian models, namely Asynchronous Input/Output Hidden Markov Models and Pair Hidden Markov Models. An EM algorithm to train the model is presented, as well as a Viterbi decoder that can be used to obtain the optima...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

متن کامل
عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015